Using phone and diphone based acoustic models for voice conversion: a step towards creating voice fonts

Authors

  • Arun Kumar
  • Ashish Verma
Abstract

Voice conversion techniques attempt to modify a speech signal so that it is perceived as if spoken by a speaker other than the original one. In this paper, we present a novel approach to voice conversion. Our approach uses acoustic models based on units of speech, such as phones and diphones. These models can be computed and used independently for a given speaker, without regard to the source or target speaker. This avoids the need for a parallel speech corpus in the voices of the source and target speakers. We show that, using the proposed approach, voice fonts can be created and stored that represent the individual characteristics of a particular speaker, to be used for customizing synthetic speech. We also show, through objective and subjective tests, that the voice conversion quality is comparable to that of other approaches that require a parallel speech corpus.
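The idea of per-speaker, unit-based models can be pictured with a small sketch. This is not the method from the paper, whose details the abstract does not give. It is a deliberately simplified illustration that assumes each speech frame is already labelled with a phone and represented by a short feature vector, and it lets a per-phone mean vector stand in for an acoustic model. The function names `build_voice_font` and `convert` are hypothetical. The point it shows is that each speaker's "font" is built from that speaker's data alone, so no parallel corpus is needed.

```python
from collections import defaultdict


def build_voice_font(frames):
    """Build a toy "voice font": one mean feature vector per phone.

    frames is an iterable of (phone_label, feature_vector) pairs taken
    from a single speaker. Assumption: a per-phone mean is used here as a
    stand-in for a real acoustic model.
    """
    sums = {}
    counts = defaultdict(int)
    for phone, vec in frames:
        if phone not in sums:
            sums[phone] = [0.0] * len(vec)
        sums[phone] = [s + v for s, v in zip(sums[phone], vec)]
        counts[phone] += 1
    return {p: [s / counts[p] for s in sums[p]] for p in sums}


def convert(frames, source_font, target_font):
    """Shift each source frame by the difference of the per-phone means.

    Frames whose phone is missing from either font are passed through
    unchanged.
    """
    converted = []
    for phone, vec in frames:
        if phone in source_font and phone in target_font:
            shift = [t - s for s, t in zip(source_font[phone], target_font[phone])]
            vec = [v + d for v, d in zip(vec, shift)]
        converted.append((phone, list(vec)))
    return converted


# The two fonts are built from unrelated (non-parallel) recordings.
source_font = build_voice_font([("a", [1.0, 2.0]), ("a", [3.0, 4.0]), ("s", [0.0, 1.0])])
target_font = build_voice_font([("a", [4.0, 6.0]), ("s", [1.0, 1.0])])
result = convert([("a", [1.0, 2.0]), ("k", [5.0, 5.0])], source_font, target_font)
```

Because each font depends on one speaker only, the same stored target font could in principle be paired with any source font, which is the property the abstract attributes to voice fonts.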

Similar resources

Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems

This article examines methods for optimizing the performance of GMM-based voice conversion systems, taking the GMM method as the baseline to be improved. In the current methods, due to the use of a single conversion function to convert all speech units and the subsequent spectral smoothing arising from statistical averaging, we observe quality ...

Voice GMM modelling for FESTIVAL/MBROLA emotive TTS synthesis

Voice quality is recognized to play an important role in the rendering of emotions in verbal communication. In this paper we explore the effectiveness of a processing framework for voice transformations aimed at the analysis and synthesis of emotive speech. We use a GMM-based model to compute the differences between an MBROLA voice and an angry voice, and we address the modification of the...

Modeling speaking rate for voice fonts

Voice fonts are created and stored for a speaker, to be used to synthesize speech in the speaker’s voice. The most important descriptors of voice fonts are spectral envelope for acoustic units and prosodic features such as fundamental frequency and average speaking rate. In this paper, we present a new approach to model the speaking rate so that it can be easily incorporated in voice fonts and ...

The Study of Vocal Function in Patients With Early Laryngeal Carcinoma After Transoral Laser Microsurgery

Objective: Today, transoral laser microsurgery is considered one of the first options for controlling early laryngeal cancer, and voice disorder is one of the inevitable complications of this treatment. This study aimed to compare vocal function in patients with early-stage laryngeal cancer following laser surgery with that of healthy individuals with normal voice quality using acoustic ana...

Acoustic Voice Measures in Benign Mass Lesions

Objectives: The present study aims to compare acoustic voice parameters among patients with vocal cord nodules, patients with polyps, and normal subjects. Methods: In this cross-sectional case-control study, the participants were selected by convenience sampling, including 30 patients with vocal polyps in the first group, 38 patients with vocal nodules in the second group, and 42 participants without voice pathologies a...

Publication date: 2003